A Framework for Distributed Cleaning of Data Streams



Similar resources

QueueLinker: A Framework for Parallel Distributed Processing of Data Streams

With the development of computer systems, many more devices are being connected to the network and generating ‘data stream.’ Analyzing data streams in real-time offers valuable information about human activities and contributes to many information services. QueueLinker enables programmers to build data stream processing applications by implementing application modules that use a producer–consum...


A Framework for Clustering Evolving Data Streams

The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem, they are generally blind...


A New Framework for Data Streams Classification

Mining data streams has recently become an important and challenging task for a wide range of services, including credit card fraud detection, sensor networks and web applications. In these applications data do not typically take the form of persistent relations, but tend to arrive in multiple, continuous, rapid and time-varying data streams. Hence, conventional knowledge discovery tools cannot ...


A Framework for Data Cleaning in Data Warehouses

It is a persistent challenge to achieve a high quality of data in data warehouses. Data cleaning is a crucial task for such a challenge. To deal with this challenge, a set of methods and tools has been developed. However, there are still at least two questions that need to be answered: How to improve the efficiency while performing data cleaning? How to improve the degree of automation when perfor...


Models for Distributed, Large Scale Data Cleaning

Poor data quality is a serious and costly problem affecting organizations across all industries. Real data is often dirty, containing missing, erroneous, incomplete, and duplicate values. Declarative data cleaning techniques have been proposed to resolve some of these underlying errors by identifying the inconsistencies and proposing updates to the data. However, much of this work has focused o...



Journal

Journal title: Procedia Computer Science

Year: 2015

ISSN: 1877-0509

DOI: 10.1016/j.procs.2015.05.156